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ABSTRACT 

In system identification, one usually cares most about finding a model whose outputs are as close 
as possible to the true system outputs when the same input is applied to both. However, most 
system identification algorithms do not minimize this output error. Often they minimize model 
equation error instead, as in typical least-squares fits using a finite-difference model, and it is 
seen here that this distinction is significant. Here, we develop a set of system identification 
algorithms that minimize output error for multi-input/multi-output and multi-input/single-output 
systems. This is done with sequential quadratic programming iterations on the nonlinear least- 
squares problems, with an eigendecomposition to handle indefinite second partials. This 
optimization minimizes a nonlinear function of many variables, and hence can converge to local 
minima. To handle this problem, we start the iterations from the OKID (Observer/Kalman 
Identification) algorithm result. Not only has OKID proved very effective in practice, it 
minimizes an output error of an observer which has the property that as the data set gets large, it 
converges to minimizing the criterion of interest here. Hence, it is a particularly good starting 
point for the nonlinear iterations here. Examples show that the methods developed here eliminate 
the bias that is often observed using any system identification methods of either over-estimating 
or under-estimating the damping of vibration modes in lightly damped structures. 
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INTRODUCTION 


There are many approaches to performing linear system identification from input-output data. 
One large class of such methods uses single-input, single-output higher order difference equation 
models as in [1], and makes a fit that minimizes the equation error. Another large class of 
identification methods create modern state variable realizations from data, as in the 
Observer/Kalman filter Identification (OKID) algorithm, see references [2,3] and its predecessors 
[4-6], This approach makes a fit that minimizes the equation error of an observer, one which 
corresponds to a Kalman filter under appropriate limiting circumstances. In both cases, the 
minimization is not done with respect to what is really of interest in system identification for 
structural analyses. What one cares about is that the model developed has the property that when 
inputs are applied to the model, it produces outputs as close as possible to the corresponding 
outputs of the true system. Hence, it is the purpose of this paper to create system identification 
algorithms that are optimized with respect to this objective, i.e. that minimize the sum of the 
squares of the output errors of the model in fitting the data. 

Optimization of this objective function is a minimization of a nonlinear function of many 
parameters. Here we use a nonlinear least- squares approach, involving sequential quadratic 
programming. Nondefiniteness in the second parti als for the model sensitivity is handled by 
eigendecomposition followed by setting negative eigenvalues of the second partials term to zero. 
Because this is a minimization in a multidimensional space, it is possible to have the algorithm 
go to a local minimum, rather than a global minimum. Thus, it is important to start with initial 
conditions for the recursive process that are as close as possible to the true solution. Here, we 
choose to apply OKID to produce the initial starting conditions. OKID has proved very effective 
in difficult spacecraft identification problems [3]. We could also choose to use results from 
related algorithms. References [7,8] modify OKID to perform residual whitening, which can be 
helpful when the data set is relatively short. References [9,10] give modified versions of OKID 
that identify the Hankel matrix directly rather than the Markov parameters that occur in it. 

OKID and these modified versions are good algorithms, which are often able to produce effective 
models in situation when other algorithms fail. The approaches developed here will enhance the 
results - these approaches start with the OKID type result, and then minimize the model output 
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error. Whether a local or a global minimum is reached, the models produced will give outputs 
that are closer to the true system model outputs in the sense of the optimization criterion. 

Here we develop algorithms to minimize the optimization criterion for five different types of 
system models: 

1. Multiple-input single-output (MISO) ARX model or finite-difference model 

2. A state variable model using an observable canonical form realization for MISO 

3. A multiple-input multiple-output (MIMO) version of (1) above 

4. A MIMO version of (2) above 

5. A MIMO state variable formulation in block diagonal form for vibration modes 

The last realization is particularly useful for structural dynamic systems or modal testing where 
one knows that the system is governed by vibration modes, and one is particularly interested in 
knowing the mode frequencies and the inode damping factors. These are being identified directly 
in this approach. 

In the following sections we first develop the mathematical formulation of the problem, and then 
give the general fonn of the nonlinear least-squares algorithm in terms of the first and second 
partials of the model outputs with respect to the model parameters. Then we address each of the 
five types of models individually, developing analytical expressions for these first and second 
partials. There are, of course, nonlinear least- squares algorithms that do not require knowledge of 
the second partials, and there are algorithms that do not require knowledge of the first partials. 
For typical system identification problems, the run times are unreasonably long for such 
algorithms that do not require explicit expressions for these partial derivatives. Hence, one of the 
contributions of this work is to set up effective computation procedures for the first and second 
partials for each of the model types above, and this results in orders of magnitude improvement 
in run time, producing reasonable and practical algorithms. The approach has already been used 
in speech coding, with good results as described in [1 1]. 

Numerical studies are reported hi the final section of this paper, which demonstrate that the 
procedures developed here give substantial improvement in system identification results. One 
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example of this is in the identification of the damping factors in lightly damped structural 
dynamic system, such as flexible spacecraft. OKID and Eigensystem Realization Algorithm [12- 
13] results very often overestimate the amount of damping in lightly damped modes. By applying 
the methods here to the OKID result, the bias in the damping is seen to disappear. 

OPTIMIZATION CRITERION AND NOTATION 

Data: Suppose that we are given experimental input and output data for N time steps, i.e. the 
input history w(0),w(l),. ..,w(/V) and the output history y*(0),y*(l),..., y*{N) . These outputs can 
be considered to differ from the true system output due to noise in the data. Algorithms will be 
developed in the following sections for various types of models. Depending on the assumed 
model order, some of the beginning data points will correspond to initial conditions for the 
model, and then the first time step of the output of the model is denoted as time step k = k 0 . Let 
Y be a column vector of the outputs starting with k = k a and going to k = N , giving all the data 
points from which the model is to match as well as possible. The model output will be denoted as 
y(k). It might be clearer to use some notation that distinguishes y(k) as a model output variable 
and not the true system output variable. However, we will not have any need to write an 
expression for the true system, a system which we never know. Hence, for notational simplicity 
y(k) is the model output. 

Parameters: Each model contains certain parameters which must be chosen in order to make the 
model fit the data as well as possible. These parameters include the coefficients in the model, but 
they can also include the initial conditions for the model for the given data set. The parameters 
are collected as a column matrix p. When we wish to show explicitly the dependence on these 
parameters, the model output y(k) will be written y(k; p) . The model output history and the 
measured history y ( k ) from k = k a to k = N are packaged as the column vector y( p) and y . 

Optimization Criterion: With these definitions, we can now conveniently write the objective 
function. We wish to minimize 


j(p) = Uy(p)-ifly(p)-f) 


(i) 
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over the parameter set p . The objective is to find the model of the chosen type that minimizes the 
Euclidean norm of the model output error in fitting the data. This represents a nonlinear least 
squares problem, and we create a sequential quadratic programming approach to minimizing 
/(p), making use of the fact that we can develop explicit expressions for both first and second 
derivative information for the change of J as a function of parameters p . 

Derivative Notation: For notational simplicity, we indicate the derivative of a quantity with 
respect to some variable by using a subscript containing a comma followed by that variable. 
Hence, J (p) represents the column vector of the scalar J differentiated with respect to the 

elements of p in order from top to bottom. The derivative of a column vector with respect to a 
vector, as in y p (p) , denotes the rectangular matrix with the first column being the derivative of 

y(p) with respect to the first element of p, the second column is the derivative of y(p) with 
respect to the second element of p, etc. The notation y tPP (k;p) indicates the Hessian matrix of 
second partial derivatives, with the i, j component being the second partial derivative with 
respect to the ith and the jth elements of p . 

TAYLOR SERIES EXPANSION OF OPTIMIZATION CRITERION 

The first and second derivatives of the objective function are written in terms of this notation as 

J, P (P) = ( 2 ) 

J.ppiP) = l p ip)l p ip) + Q(p) ( 3 ) 

where 

QiP) = 'Z[y(k;p)-y\k)] T y tPp (k;p) 


Then, the quadratic approximation of the objective function, written in terms of parameter values 
p for the expansion point, and values p + Ap at the evaluation point, becomes 
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( 4 ) 


J(P + Ap) = J(p) + J p (p)Ap + iAp T J i/v (p)Ap 

= /(p) + [y(p)-/fy^(p)Ap + |Ap r / P ip)yj,p) + Q{p) A p 

THE NONLINEAR LEAST SQUARES ALGORITHM 

Minimizing the Quadratic Expansion of the Cost: Sequential quadratic programming starts with 
an initial guess for the model parameters p, finds the change A p to minimize the quadratic 
approximation in (4), takes a step in direction A p, and repeats the procedure until convergence. 

The A p that minimizes this quadratic approximation satisfies 

l P iP)l^P) + <2( p)]ap = -y T /P)[y(p) -/] (5) 

T 

provided a finite minimum exists, i.e. provided ^{p)y_ ^{p) + Q{p) is at least positive semi- 

definite. 

Choice of Updates in Indefinite Situations: The first term y p (P)y_ p (P) can never have a 

negative eigenvalue, but it is possible for Q( p) to have negative eigenvalues. In order to address 
the case where a finite minimum does not exists, we choose to perform an eigendecomposition 
on Q(p ) , i.e. Q( p) = M~ l AM where A is the diagonal matrix of eigenvalues. Then, we replace 
any negative eigenvalue in A by zero, and call the resulting matrix A . Then let 

Q{p) = M~ l AM (6) 

and use this in place of Q{ p) in the parameter update formula (5). Note that the sum of a 
positive definite matrix and a positive semi-definite matrix is positive definite. Therefore, the 
solution of (5) for A p exists. 
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The Update Law and Step-Size Control: The general nonlinear least- squares algorithm to be 
applied to each of the five model types, can now be stated. Starting from some initial parameter 
estimates p 0 , we iteratively update the parameters according to 


P 


j + 1 


Pi~ a i 


y ( Pi)y (pj)+Q(p,) 


y Sp ,)Vy (Pi)- y*\ 


i= 0,1,2,... 


(7) 


The coefficient is a step size which in the numerical results reported below is adjusted as 
follows. Each iteration starts with a, = 1 , and the result is checked to see whether the resulting 
system is asymptotically stable, and to see if the associated cost (1) is decreased. If both 
conditions are satisfied, then the step is taken. Otherwise, the value of a t is successively 
decreased to 0.9, 0.8, ... , 0.1, 0.09, ... , 0.01, 0.009, ... , 0.001, etc. until both conditions are met. 
The user specifies a condition for termination of the iterations. If the Euclidean norm of the error 
between the model output and the data at iteration i+1, minus the same quantity at i, divided by 
the nonn of the error at iteration /, is less than a pre specified value, the iteration is stopped. 

Other Possible Algorithm Choices: Note that if the eigendecomposition is considered too time 
consuming, one could adopt various solvers that are designed to handle indefinite Hessians. It is 
also possible to avoid the eigendecomposition by using a modified Cholesky decomposition for 
the whole exact Hessian matrix. In the examples discussed below, the eigendecomposition was 
quite tractable, with quite reasonable run times, and it is of course the best, or most robust way to 
handle the non-definite second partial problem. A good text discussing the range of possible 
methods is [14]. 

What Remains to be Specified to Complete the Algorithm for each Model Type: hi order to 
complete the system identification algorithm, two things remain: 

1. One is the choice of method to obtain the starting values p 0 , discussed in the next section. 

2. Second is the development of explicit methods to evaluate the derivatives y (p) and 

y pp {k;p) for the five choices of model types. These are discussed successively in the 
sections that follow. 
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OKID INITIALIZATION OF NONLINEAR LEAST SQUARES ALGORITHM 

In this section, we review the OKID algorithm and discuss what criterion it optimizes in 
producing its identification results. It is seen that in limiting cases this criterion approaches that 
being used here, so that employing OKID to get the starting values is a very good choice — the 
OKID results should be close to the optimum we are searching for in the nonlinear least-squares 
iterations. 

Suppose that the data corresponds to a linear system of the form 

x{k + 1) = Ax( k ) + Bu{ k ) 

y(k) = Cx(k) + Du{k) ^ 


Then a linear observer takes the fonn 


v(£ + l) = Ax(k) + Bu{k) - G[y(k) - y{k)] = Ax(k) + Bv(k) 

y{k) = Cx(k) + Du(k) ^ 

where the right hand side of the difference equation contains the measurement y(k) with the 
observer gain M. The coefficient matrices are related by A = A + GC, B =[B + GD, -G], 

v(k) = [u T (k) 

There is a convolution sum version of the observer (9) 

y{k) = Du{k) + CBv{k -!)+■■ -+CA P 1 Bv(k - p) + C A P x(k - p) (10) 

where p is a time- step integer. Equation (10) can be packaged for the data set being used as 
follows 


y =YV +e+CA p X 


(ID 
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where 


y = [y (/>+i) 

y(p+ 2) • 

44]; 

e = [e r (p + 1) 

£ r (p + 2) 


X = 

[4D 42) ••• x(£ 

-/>-i)]; 

Y = | 

D CB 

CAB ■ • • 

ca p ~ 1 b\ 


u(p + 1) 

u( p + 2) • 

u(£) 


v(p) 

v(p + 1) • 

•• v(£ - 1) 

V = 

V(p- 1) 

4 p) 

•• v(£-2) 


4D 

42) • 

•• v(£-p) 


The quantity f is introduced to represent equation error when the data in y and V are 
inconsistent due to noise. 

The matrix A is an observer system matrix. In the noise free case it can be made deadbeat, and in 
the presence of noise, it can correspond to a Kalman filter. In either case it is asymptotically 
stable, and there will be a value of p for which A p is negligible. In using OKID one picks such 
a p making the last tenn of (11) negligible, and performs a least squares solution to the 
remainder of the equation, using 


Y=yV T [VV T Y l (12) 

which minimizes the error measure e T e , where £ is the difference between the y obtained using 
this model (with A p neglected) and measured values. Then from Y the Markov parameters of 
the system are computed. From these the modern control realization of the form of Eq. (8) is 
obtained by the ERA-DC algorithm [12]. This is a nonlinear computation and noise in the data 
can therefore introduce biases [16], but as noted below biases disappear under appropriate 
conditions. 

We can make the following conclusions: 
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1. OKID minimizes the optimization function of interest, the error between output of the model 
(10) and measured output, but this is not done for a model (8) of the system itself. Instead it 
is for an observer of the system. This differs from the optimization we wish to perform here 
because the previous outputs appear on the right hand side of the equation (10). 

2. If the mam point of the identification is to obtain an observer, for example for use in 
feedback control, then OKID is directly addressing the appropriate objective function. 

3. The observer is in the convolution sum form (10), and neglects tenns containing A p . Before 
an observer is used, it will most likely be put in a recursive state variable difference equation 
form (9). In this case, it is possible that using the additional nonlinear least-squares 
optimization on the state variable model coefficients could produce improved results. Getting 
from the observer Markov parameters, i.e., coefficient matrices in (10), to the minimal state 
variable model (9) involves nonlinear equations. Thus the optimization in state space form 
(8) can produce a somewhat different result than in convolution sum form (10). Note that the 
assumption of A p being negligible is not used in the process of the optimization. 

We have stated that under appropriate conditions, the observer identified by OKID converges to 
the steady state Kalman filter. The development of a Kalman filter assumes that the system 
behavior can be described by a model of the fonn 

x(k + 1) = Ax(k ) + Bu(k ) + r{k) 

(13) 

y(k) = Cx(k) + Du(k)+s{k ) v ' 

where r(k) and s(k) are zero mean, white, and Gaussianly distributed noise. Post multiply Eq. 
(11) by V T , to obtain 

yV T =Y{VV t )+ eV T +CA p XV t (14) 

The choice of the identified observer model in (12) makes the left hand side and the first tenn on 

— T 

the right match in this equation. Examine the second term on the right. The component of eV 
formed by the product of f with the row of V that is / + 1 from the bottom can be written as 
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( 15 ) 


e-p 

X £ r(/ ? + j)v r (/‘ + 7 — 1) ; i = l,...,p 

7=1 

If this term is divided by i- p, it represents the time average of the product £{k)v T (k - p-l + i) 
from k = p+\ to £. By the ergodic property, if the product is a sample function of a stationary 
random process, it can be replaced by its ensemble average provided that i goes to infinity, 

i — > oo 


E[£ r {k)v T {k- p-l + i)\ =lim— — ^e r (j)v T (j- p-l + i ) ; k>p (16) 

Then the terms remaining in Eq. ( 14) become 

E[s r (k)v T (k- 1) e,.(k)v T (k- 2) - e,.{k)v T {k - p)\ 

~ r (17) 

= -CA p E[X(k)v T (k + p- 1) X(k)v T (k + p- 2) ••• X(Ar)v r (Ar)] 

Using the assumption that A p is negligible, we conclude that the procedure indicates 

E[e r (k)v T (k -/')] = 0 for i = 1,2 

We can now make a second set of conclusions: 

1. Suppose that the real world is governed by a model of the form of Eq. (13), a finite 
dimensional linear model with additive zero mean white Gaussian plant and measurement 
noise. 

2. Suppose that A p is negligible. 

3. Suppose that the data set is sufficiently long that one can use the ergodic property to replace a 
time average by an expected value, 

4. Then the quantity f (A) is the innovations term, the difference between the measured output 
and the output predicted by the observer, which is white in a Kalman filter. 

5. The observer satisfies the equation E[£ (k)y T (k- /)] = 0; / = l....,A which is the 

orthogonality principle. When an observer satisfies this orthogonality principle, it is a 
Kalman filter [15], and the estimated output is unbiased. 
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6. A Kalman filter has the property that it minimizes the variance of the (white) error in the 
estimated output for each time step. 

7. This is equivalent to minimizing the sum of the variances (or traces of the covariances in the 
multiple output case) of these white errors. And this can be written as the time average in 
place of the expectation. 

8. This time average is the optimization criterion we wish to minimize in this paper. 

9. The OKID produces a result that satisfies the optimality condition proposed in this paper, 
when the true world is governed by a model with additive plant and measurement noise, A 11 
is negligible, and a sufficiently long data set is used such that time average to coincide with 
the ensemble average. 

10. The fact that the OKID result satisfies the optimality condition we wish to satisfy in the 
limiting situation described here, means that it is a particularly good choice as starting 
conditions for the nonlinear least squares algorithm in this paper. 

1 1. In so far as these assumptions are not met, the algorithms developed in this paper can be used 
to improve the OKID result. 

12. When the data set is short, violating the long data record assumption here, the OKID with 
residual whitening algorithm [7,8] can be of help to produce starting values that are closer to 
the optimal solution we are looking for, by more closely satisfying the orthogonality 
principle. Also, the algorithms in [9,10] can have improved satisfaction of the orthogonality 
property in short data records. Here we restrict ourselves to use of OKID itself. 

IDENTIFICATION OF MISO SYSTEMS USING AN ARX REAFIZATION 

In this section we consider multiple-input, single-output ARX models (MISO) of the form 

y{k ) = a x y{k - 1) + a 2 y{k - 2)-\ — \-a p y(k - p) 

+{3 0 u{k ) + p x u{ k - 1)+- ■ ■+/3 p u(k - p) ' ^ 

and develop the first and second partial derivatives of the output as a function of the model 

parameters, y p {k;p ) and y pp {k;p), needed in the nonlinear least squares algorithm outlined 

previously. For simplicity of notation we will often drop the parameter argument p in the 

notation y(k). The parameters of the model are the scalars a l ,a 2 ,...,a p and the row vectors 
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..,/3 p . A method to include the initial conditions into the parameter list will be discussed 
later. Now, let the requisite number of values of y( k ) and u{k) from the data be used as initial 
conditions for Eq. (18), i.e. 


y(l),y(2),...,y(p) ; u(l) ,u(2),...,u(p) (19) 

when we assume a model of order P and allow a fi () tenn (for example, when using collocated 
accelerometer outputs in structural vibrations such a term is needed). This approach is useful 
when there is no noise in its initial information, or the inaccuracies of these initial values, which 
decay over a time interval of roughly one settling time, makes negligible error when the full data 
set is considered. 

Now consider the evaluation of the components of y_ (p ) that are derivatives with respect to the 

parameter a, in p. Differentiation of Eq. (18) with respect to a v a 2 , ...,a p produces the 
following sensitivity equations that couple in one direction to ( 1 8) 


y^ (*) = «i y^ (k - 1) + a 2 .v, (k-2)+- ■ ■+a„y >ai ( k-p) + y(k - 1) 
y ^ (*) = V * 2 (k - 1) + a 2 y^(k - 2)+- ■ -+a n y ai (k -p) + y(k - 2) 
(k) = (. k - 1) + a 2 y as (, k - 2)+- ■ -+a n y^ (k - p) + y(k - 3) 


Since the mitial conditions in (19) in the present formulation come from data, their sensitivity to 
parameters of the model must be zero. This means that the initial conditions for the first equation 
in (20) are y a (1) = y a (2) =•••= y a (p) = 0 for all /. Note that the forcing function for the 

y, ttj+i (k) sensitivity equation is the same as that for the y a (k - 1) sensitivity equation except that 
it is delayed by one time step. 

The sensitivities of y(k) with respect to the /th component of the row vectors 
behave similarly, satisfying 
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y,p 0i (*) = a x y A)i (k - 1) + a 2 y Pm ( k - 2)+- • -+a n y A; (k - p) + u (A) 

3^,. (£) = oc x y Ai (k - 1) + a 2 y Ai (k - 2)+- • H-a„y A (* - yz) + u (£ - 1) 

y Al - (*) = (*-!) + (* - 2)+- • -+a n y,p 2i (k -p) + w, (k - 2 ) 


( 21 ) 


with the same statements applying to the initial conditions, and the same time shift condition 
applying to the forcing functions. 

The second partial derivatives satisfy 


y 


, (k) = ( k - 1) + a 2 y aiaj (k - 2)+- • ■+a p y aja ,{k - p) + y a ik - j) + v, aj ( k-i ) 

y,a iPjt (k) = «i)V, A (k - 1 ) + a 2 y, a ,p jt (k- 2 )+- • -+a,y,„ A (k - p) + v A (k -i) 


( 22 ) 

(23) 


Again, the initial conditions on y(k) are independent of the model, and hence their sensitivities 
with respect to the model parameters are zero, which gives zero initial conditions for these 
equations. Note that the sensitivity equations for y ,, j } (k) are homogeneous equations with zero 

initial conditions, and therefore these sensitivities are zero for all time steps. 

From equations (20) and (21), it is noted that the forcing functions were all the same except for a 
time shift. This suggests that the solutions might also be the same except for a time shift. If this 
is true, then the equations for the second partials in (22) and (23) are fed by forcing functions that 
again are all the same except for a time shift and their solutions can again be all the same with 
time shifts. When this property applies to these sensitivity equations, it is possible to obtain a 
very substantial compression in computation requirements, because one needs to solve only one 
equation for each type of sensitivity, and then obtain all the rest by shifting the time argument an 
appropriate number of steps. 

The solutions to the sensitivity (20) and (21) will all be pure time shifts of each other provided it 
is possible to set up the experiments so that the initial conditions on y(k) and u{k) in Eq. (19) 
are zero. Even when one cannot arrange that this applies to the data at hand, one may still be able 
to take advantage of this compression in computation time, and simply apply time shifted results 
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ill the computations. The differences in solutions at the start will likely be small, and will 
disappear in roughly one settling time of the system, and hence they may have insignificant 
effect on the convergence of the algorithm. 


When taking advantage of these shifting properties of the solutions, one solves the following set 
of difference equations 

y(k) = a l y(k-l) + a 2 y(k-2)+---+a p y(k-p) + /i 0 u(k) + fi l u(k-l)+---+l) p u(k-p) 
y, ai (*) = «1 (k - 1) + cc 2 y ai ( k - 2)+- • -+oc p y ai ( k-p) + y(k - 1) 

y,p 0i (k) = ocj Ai (k - 1) + a 2 y Ai ( k - 2)+- • • +a p y At (k -p) + w, (k) (24) 

3U,«, (k) = <X\y Ml a x ( k " !) + a 2 y,a iai ( k - 2)+- • -+a p y Miai ( k-p) + 2 y at (k - 1) 

y*j> (*> = (* - d + « 2>w 0i (* - 2)+- • „ l/Joi (* " P) + v Ai . (* - 1) 

Then the remaining sensitivities are obtained by the following shifting equations 

(k) = .V,„ M (k - 1) =•••= V„, (k-i + 1) 

y.ftt (k) = v A _, ( (* - 1) =-= y At (k-i) 

y,a,* J (k) = )\ ai a l (k-i-J + 2) 
y ai fi it (k) = y^ p Jk-i-j+ 1) 

together with the fact that all y^, (A) are zero. Examples of use of this algorithm are given 
below. 

IDENTIFICATION OF MISO SYSTEMS USING THE OBSERVABLE STATE SPACE 
REALIZATION 

In this section we again address MISO systems, but do so in a modem state variable formulation, 
using what might be called the left observable canonical form. We will simply call it observable 
form. The model is written as 


x(k + 1) = Ax(k) + Bu( k) 
y(k) = Cx(k) + Du(k ) 


( 26 ) 
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where 


A = 


a , 

1 

0 • 

• o' 



i 

oc 2 

0 

1 • 

• 0 

B = 

fii T 

P 2 OC 2 P 0 





a„ , 

0 

0 • 

• 1 


_P P +a P Po_ 

n—L 
_ a P 

0 

0 • 

• 0 



C = [ 1 0 0] D = fi 0 


This model is equivalent to model (18) used in the previous section. Identification results were 
identical using two programs, one for each formulation. Nevertheless, certain things are much 
easier to see using one formulation than the other, and in addition, this formulation is more 
straightforward to code. 

The first and second partial derivatives satisfy the sensitivity equations 


x p {k + 1) = Ax p (k) + A p x(k) + B p u(k) 
y, p (k ) = Cx p (k) + D u{ k ) 


x, PiPi ( k + D = Ax , PtP , (*) + A Pi x, Pi ( k ) + A P *, P S k) + B , PiPj u ( k ) 


(27) 


The matrix derivatives mvolved are very easy to calculate: 

A a . is all zero except for a one in the ith row of the first column; 

An is a zero matrix; 

B a . is all zero except for /3 0 in the ith row; 

B is all zero except for a one in the zth row and jth column; all derivatives in 

D p is a zero matrix except for the D# which is a zero row except for a one in the ith column; 


16 



B is all zero except those that involve the combinations of a i and fi 0j which are zero 
matrices except for a one in the /th row and /th column. 

A pp is a zero matrix for any combination of parameters; 

B is a zero matrix. 


The initial conditions are x{0) = x o , x p (Q) = 0, and x pip (0) = 0 . Using this form of the 
sensitivity equations, the programming of the computations becomes very straightforward. 

It is interesting to package the equations in terms of the state equation 


x^(k + l) ■■■ x (k + 1) x A (k + 1) ••• Xj,(k + l) 


= A 


x^ik) ••• x (k) x A (k) ••• X' fi (k) 


+ F 


and output equation 


y, ai (k) ■■■ y, a (k) y A (k) y A (k) ••• y p {k) 


= C 


x, ai (k) ••• x (k) x A (k) x A {k) x A (k) 


+ [() 0 u(k) 0 ••• 0] 


where 


jq (k) + Du{k) 
0 

0 


0 

x { (k) + Du{k) 
0 


0 

0 


a Y u T (k) U 1 (k) 0 

a 2 u T {k) 0 u T (k) 


x^k) + Du{k) a u T (k) 0 


0 


0 

0 

u T (k) 


As mentioned before, the algorithm of this section must produce identical results as that of the 
previous section, but is more easily programmed in a language such as MATLAB. Since matrix 
manipulations in that language are efficient, there is little incentive to make a more specialized 
program to take advantage of the number of zeros present in the matrices. Which approach is 
better computationally is perhaps language dependent and also dependent on the programming 
details. The observable canonical form here makes the j6 0 term appeal* fundamentally different, 


17 



whereas in the ARX version it appeal's like just another /3 ( . The shifting properties of the 
sensitivities are much more easily viewed by the ARX formulation. Both approaches were 
programmed with identical results. 

IDENTIFICATION OF THE INITIAF CONDITION 

The initial conditions for the model are the first p steps of the output data set. So far, we have 
treated these measured values as fixed numbers that are unrelated to the coefficients in the 
model. However, since they are measured values, they can contain noise, and one of the 
functions of the model is to smooth through the noise. Of course, the influence of small changes 
in initial values will disappear in the output in one settling time of the system, and hence are 
likely not to be important in most applications. If the system is very lightly damped or the data 
record short, one might want to let the optimization criterion apply to the initial conditions as 
well, to smooth through the noise in the initial p data points. Hence we include the initial 
conditions among the parameter vector p that are used in the optimization. This means that we 
need to develop one more set of sensitivity equations for these new parameters. The first p data 
points used to serve as initial conditions are relegated now to simply starting conditions on the 
nonlinear least squares iteration. 

The system matrices are not functions of the initial condition, so their first and second partials 
with these new parameters are zero. Coupling with the /i, is zero. In addition, second derivatives 
with respect to initial conditions are zero. The only sensitivity equations needed are 

•';,, ( o)^ + 1 ) = A -fv,(o,^) 

X ,a t Xj( 0 ) + 1 ) = / ^ X ,a i x J (0) (^) + \a X , x ; (0) (&) 

with initial conditions x l (0) (0) equal to a column matrix of zeros except for a one in the z'th 
entry, and * (0) (0) equal to zero. 
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IDENTIFICATION OF MIMO ARX MODELS 


The ARX model of Eq. (18) was for a multiple-input, single-output model which makes the 
notation more straightforward. To handle multiple-input, multiple- output models (MIMO) there 
is a range of choices of generalized ARX models that one can use. With reference to the ARX 
model of Eq. (18), the output y(k) must become column vectors of the multiple outputs at each 
time step. Denote the minimal order of the MIMO system as n, the number of the square ft, 
matrices as p, the number of inputs as r, and the number of outputs as m. 

One of the possible ARX formulation maintains the ft, as scalars. In this case they are the 
coefficients of the MIMO characteristic equation, and there are p = n of them. There are n + 1 
matrices /}, of dimension m by r. The number of unknown coefficients in the model that must be 
identified is n + (n + 1 )mr . 

Another ARX formulation allows the ft, to be mxm matrices. Then the number m of the ft, 
used must satisfy the constraint that mx p>n (this becomes obvious when the MIMO equation 
is put in observable form as in the next section). Then the number of unknowns in the ft, is pm 2 . 
The number of mxr matrices /), becomes p + 1, and the number of unknowns in them becomes 
( p + 1 )mr . So the number of coefficients in the model becomes pm 2 + ( p + 1 )mr . If the integer p 
can be chosen such that m x p = n, then pm 2 +( p + 1 )mr = n(m + r) + rnr . 

Different values of m can have a significant effect on the number of coefficients that must be 
determined. For the first case the number of coefficients in ft, is n, which is to be compared with 
pm 2 in the second case. Note that pm is greater than /?, so that the number of such coefficients is 
increased when one goes to matrix «. . On the other hand the number of /), matrices decreases 
from n + 1 to p +1, and the number of unknowns involved decreases by (n - p)mr . 

For any choice of m, the only changes needed to create an identification algorithm for the ARX 
model generalized to allow square matrix ft, and rectangular matrix /), is to rewrite the 
sensitivity equations in (18) to (25) to be sensitivities with respect to components of each matrix. 
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For example, writing the i, j component of the square coefficient matrix a, as a Uj , the sensitivity 
equations corresponding to Eq. (20) are 

y,aj k ) = a J,a tll (* - 1) + Oi 2 y atij (k- 2)+- • +a p y am (k - p) + a ( ^y{k -£) (29) 

where a ( is all zero except for a one in the zth row and /th column. The other equations are 
similar. 


IDENTIFICATION OF MIMO MODEFS IN OBSERVABFE FORM 


We can also generalize the algorithm for observable canonical form quite simply. The system 
matrices shown in Eq. (26) become 



I 

0 • 

• 0 

0*2 

o •• 

I • 

• 0 

0* m—1 

0 

0 • 

• I 

. oc m 

0 

0 • 

■ 0 


; C = [I 0 ••• 0] 


(30) 


where / is an identity matrix of dimension m x m . If the choice of p is such that pm = n , then this 
is a minimal realization, and when pm > n it is not. Since a, and /f are square and rectangular 
matrices, we need to take sensitivities with respect to each component. For example, 




0 


0 0 
0 0 


0 0 0 


(31) 


The shifting properties shown in Eq. (25) still apply for each component. 
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IDENTIFICATION OF MIMO MODEFS IN BFOCK DIAGONAF FORM 


In modal testing and in control of vibrations in structures such as spacecraft, one knows that the 
model should be composed of a set of vibration modes. This corresponds to having all 
eigenvalues of the system matrix be in complex conjugate pairs. Let us assume that the system 
matrix is diagonalizable, as is normally the case in structural dynamic systems even with 
repeated eigenvalue pairs, hi these applications, one is particularly interested in knowing the 
frequencies of the modes and their damping ratios. One can of course use any of the above 
methods for this class of problems, and then find the characteristic polynomial and its roots, or 
equivalently calculate the eigenvalues of the state matrix. Here we develop a method that finds 
these quantities directly. The method can be generalized to include real roots, but to do so one 
must address issues of knowing how many real roots to allow, or how to facilitate conversion 
from real to complex and vice versa. One also must address such issues as when the starting 
values for the nonlinear least squares fail to satisfy the conjugate pair requirement. 

Because the roots are in complex conjugate pairs, there is a realization in block diagonal form. 
This means that the output can be written as a superposition of the outputs of n 2 decoupled 
systems (overall system order is n = 2« 2 ), 

( 32 ) 

Xj(k + 1) = AjXj(k) + BjU(k) j = 1,2,.. .,n 2 


y(k ) =Cx{k)+ Du 


( 33 ) 


one for each complex conjugate eigenvalue pair A . where there are r inputs in u and m 

outputs in y, and 

x fi (k) 


.Xj{k) 


A j = 


B j = 


x j2 (k) 


' Xj co j 

r<°j a j. 

bjn bj 12 

bj2i b j22 


U Ar 

bjllr 
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and 


x(£) = [%(£) x n {k) ••• x nil (k) x^ik)] 7 


C 


d ' 


D 


d, 


ml 


d lr 


d„„. 


The nonlinear least squares optimization with respect to the parameters in the matrices A. and B j 

is performed as shown earlier after developing the appropriate sensitivity equations. It can be 
performed on the matrix level as in the observable canonical form algorithm. The optimization 
with respect to C and D can be done in a more direct way. Note that 

y, t S k ) = C c .,x(k) ; y di , (k)= D A . u(k ) 


where the coefficient matrices are all zero except for one entry that is unity, and the second 
partials are zero. Here we let the nonlinear least squares update the parameters in A ; . and B j , and 

then for the associated state history we find the pseudo inverse solution for estimated C and D 
from the observation equations (33) packaged together for all time steps k = 1,2,- • ■,£ in the data 


b(D 


V(0] = [C D] 


4D x(£) 

n(l) u (£) 


(34) 


Such a pseudo inverse finds the C and D matrices that minimize the cost (or sum of the squares 
of the output errors) for given A ; and B . 

Besides having the advantage of directly identifying the quantities of most interest, i.e., the 
frequencies and damping factors for each mode, this approach can have a dimensionality 
advantage. The number of coefficients in this model that must be chosen by optimization is 
n + nr + nm + mr which is the sum of the following: n scalars for eigenvalues, nr for matrix B, 
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run for C, and mr for D. Comparing it to the MIMO ARX model with scalar a t , this realization 
has fewer parameters if r + m < rm . 


NUMERICAL EXAMPLE 


In this section we present some numerical examples to show the effectiveness of the algorithms 
developed. First we consider the three mass system shown in Fig. 1 : 


d-A/WWVH 

w i 



Figure 1: Three mass-spring-dashpot system 
whose equations of motion can be written as 


Mw + Ew + Kw = Bu 
y = C a w + v 


where 


w = 


w. 


w. 


vt ; . 


M = 


m. 


0 


0 m 2 
0 0 


0 

0 

m 3 



C1 + C2 

-C2 

0 ' 

s = 

~C 2 

Ci+Ci 

-C 3 


0 

-c* 

1 


+ k 2 

—k 2 

0 ' 

K = 

—k 2 

k 2 + k 3 

-k 3 


0 

-k 3 

k 3 
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The three-dimensional column vector w is the inertial displacement of the three masses starting at 
the left for the top entry. We make a single-input, single-output system with the input u being a 
force applied to the right mass, and the output is an inertial acceleration measurement of the left 
mass. This means that column matrix B is zero except for one in the third entry, and row vector 
C a is unity in its left entry followed by two zeros. The mass matrix is taken as the identity 
matrix, each of the k t in the stiffness matrix K are set to 1000, and the damping matrix 5 has the 
same form as K with the k t replaced by 0.1. The v represents white zero-mean Gaussian 
measurement noise with variance equal to unity. Simulated experimental data is generated 
numerically solving these equations to produce 1000 data points with a sample time of 0.05 
seconds. The input u = u i is a white zero-mean Gaussian input also of variance equal to unity. 
Thus the measurement noise level in this data is quite high. The ARX algorithm developed above 
for MISO systems is used to produce an identification. In addition, a different program for the 
observable state space realization is applied, producing identical results. 

Table 1 gives the true values of the frequencies and the dampings for each mode. The frequencies 
are the undamped natural frequencies in Hertz (given by the magnitude of the system eigenvalues 
divided by 2k), and the damping in percent is given by 100 times the real part of the root divided 
by its magnitude. Note that the system is very lightly damped. 

OKID is used on the data set to obtain the initial identification. This algorithm has a parameter p 
which is chosen as 6 in this run. The results are given in Table 1, and we see that the third 
frequency is not very accurate, and that the dampings are all over estimated, sometimes by a very 
large margin. This over estimation of damping is a very common phenomenon in modal 
identification with algorithms such as OKID and ERA, ERA-DC. 


Table 1. Comparison of OKID-identified and ARX-optimized frequencies and dampings. 



Trae Values 

OKID Result 

Optimized Identification 

Mode No. 

Freq (Hz) 

Damping % 

Freq (Hz) 

Damping % 

Freq (Hz) 

Damping % 

1 

2.240 

0.070 

2.215 

2.537 

2.240 

0.080 

2 

6.276 

0.197 

6.274 

0.361 

6.276 

0.196 

3 

9.069 

0.285 

9.873 

12.472 

9.069 

0.270 
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We now apply the MISO ARX algorithm developed here to the data set, starting with the OKID 
result. Instead of quoting the values of the optimization criterion (1) we take the square root of it 
to produce the norm of the error. The initial value from the OKID result is 130.974. After 19 
iterations this norm has stabilized to 6 digits and the iterations are terminated. The final norm of 
the error is 31.2948. Note that the norm of the measurement noise alone is 31.430. The fact that 
this number is slightly larger than the final error level means that a small part of the noise has 
been fit in the model. It would be impossible to meaningfully get to a lower error level with 
further iterations, or by using any other algorithm. 

Examination of the identified frequencies and dampings shows that in spite of the very high 
noise level, the results are quite good. The damping for each mode is close to the true value, so 
that the optimized identification method developed here is seen to fix the persistent problem of 
overestimating the damping in modal identification. 

The first iteration used a step size of 0.5 and decreased the norm of the error to 120. The second 
iteration used a step size of 1 and reached an error norm of 80. By iteration 6 the error was 56. 
The step size was unity for all but 5 iterations with the nonunity values ranging from 0.2 to 0.7. 
One issue of importance in this class of problems is the run times to get results. When a standard 
nonlinear least- squares algorithm was used without having developed the derivative information 
as in this paper, the run times are prohibitive. The run time for this optimized identification took 
0.81823 minutes on a 200 megaHertz pentium PC using Matlab version 5.0. Thus, the 
computation times are reasonable, making the algorithm practical for real engineering 
applications. 

EXPERIMENTAL EXAMPLE 

The experimental results given in this section illustrate the usefulness of the proposed 
optimization technique in practice. The truss structure at NASA Langley Research Center is 
shown in Fig. 2. The L-shaped structure consists of nine bays on its vertical section, and one bay 
on its horizontal section, extending 90 inches and 20 inches, respectively. The shorter section is 
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clamped to a steel plate, which is rigidly attached to the wall. The square cross-section is 10 
inches by 10 inches. Two cold air jet thrusters, located at the beam tip, serve as actuators for 
excitation and control. Each thruster has a maximum thrust of 2.2 lbs. Two servo 
accelerometers located at a comer of the square cross-section provide the in-plane tip 
acceleration measurements. In addition, an offset weight of 30 pounds is added to enhance the 
dynamic coupling between the two principal axes, and to lower the structure fundamental 
frequency. For identification, the truss is excited using random inputs to both thrusters. The 
input-output signals are sampled at 250 Hz and recorded for system identification. A data record 
of 2,000 points is used for identification. 



First, OKID was used on the data set to obtain the initial identification with the parameter p = 4 
in this mn, i.e., the system order p x m = 4x2 = 8 where m is the number of outputs. Next, we 
applied the MIMO ARX algorithm and the MIMO block-diagonal form algorithm to the data set, 
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stalling with the OKID result. The initial norm from the OKID result was 715.9185. After 30 
iterations using the ARX algorithm, the final norm of the error was 78.3937. On the other hand, 
the block-diagonal form algorithm reduced the initial norm to the final value 78.86 after 30 
iterations. 

For the ARX algorithm, the first iteration used a step size of 0.2 and decreased the norm of the 
error to 585.3143. The second iteration used a step size of 0.7 and reached an error norm of 
445.5199. By iteration 10 the error was 88.829. The step size was unity for all but 3 iterations 
with the nonunity values of 0.2, 0.7, and 0.9. The run time for this optimized identification took 
8.45 minutes on a 200 megaHertz pentium PC using Matlab version 5.0. 

For the block-diagonal form algorithm, the first iteration used a step size of 0.7 and decreased the 
norm of the error to 411.4671. The second iteration used a step size of 1.0 and reached an error 
norm of 203.4215. By iteration 10 the error was 81.012. The step size was unity for all but the 
first iteration. The run time for this optimized identification took 7.23 minutes. 

Table 2 gives the OKID-identified and optimized values of the frequencies and the dampings. 
We may conclude that the first two modes are system modes, whereas the last two modes having 
very high damping ratios may be considered to be computational modes or noise modes due to 
measurement noise and/or system uncertainties. The inconsistency between the OKID result and 
the other results in damping of the fourth mode make it ambiguous whether the fourth mode is a 
system mode. 


Table 2. Comparison of OKID-identified and optimized frequencies and dampings. 



OKID result 

ARX 

Block-Diagonal Form 

Mode No. 

Freq (Hz) 

Damping % 

Freq (Hz) 

Damping % 

Freq (Hz) 

Damping % 

1 

5.906 

2.555 

5.845 

0.388 

5.845 

0.384 

2 

7.318 

1.887 

7.275 

0.457 

7.275 

0.463 

3 

9.069 

24.899 

17.534 

92.328 

25.349 

98.008 

4 

48.885 

1.751 

29.431 

32.850 

44.693 

79.945 
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Let us reset the parameter p = 40. The OKID without any singular value truncation would then 
identify a model of size 80. Among the complex modes from the OKID-identified model, let us 
choose six complex modes as shown in Table 3 to form the reduced model for optimized 
identification. Before applying any optimization algorithm, it is better to improve the reduced 
model by using the reduced state matrix A and output matrix C to update the input matrix B and 
the direct transmission matrix D, and compute the initial state vector x(0) (see Ref. [17]). The 
initial frequencies and dampings would remain unchanged in this updating process. We applied 
the MIMO block-diagonal algorithm to the data set, starting with the reduced OKID-identified 
model. The initial norm from the OKID result was 165.5505. After 17 iterations, the final norm 
of the error had stabilized to 6 digits and the value was 73.8241. The first iteration used a step 
size of 0.7 and decreased the norm of the error to 153.8757. By iteration 10 the error reduced to 
81.181 which was about half of the initial error. The step size was unity for all but 5 iterations 
with the nonunity values ranging from 0.0002 at the last iteration to 0.9 at the 14th iteration. 
Other methods also produced similar results (not shown). 

Examination of the identified frequencies and dampings shown in Table 3 indicates that in spite 
of the presence of noise and uncertainties, the OKID results are quite good when the parameter p 
is chosen quite large. The optimized identification improves the identification of the third mode 
and distinguishes the third mode from the fourth mode significantly in damping. 

Table 3. Reduced OKID and optimized frequencies and dampings. 



OKID result 

Block-Diagonal Form 

Mode No. 

Freq (Hz) 

Damping % 

Freq (Hz) 

Damping % 

1 

5.846 

0.375 

5.846 

0.379 

2 

7.279 

0.439 

7.277 

0.448 

3 

48.597 

0.481 

48.525 

0.000 

4 

46.493 

1.531 

49.316 

99.596 

5 

120.023 

3.129 

33.527 

69.338 


28 




































CONCLUSIONS 


The most common objective of system identification is to produce a model from data such that 
when the model and the real system are subjected to the same inputs, the model outputs are as 
close as possible to the true system outputs. However, most system identification algorithms do 
not make a model that fits the data in this sense of matching outputs, hi this paper we have 
developed algorithms that do minimize this objective function, and illustrate that the 
identification results fix some persistent difficulties in overestimating damping in lightly damped 
structures. Minimizing this output error requires minimizing a nonlinear function of many 
parameters. Using routine methods that are derivative free, or do not require the second 
derivative, produces prohibitive run times. Here we derive the needed derivative information for 
several classes of model realizations, and develop a nonlinear least-squares algorithm that 
handles indefinite second partials. Examples demonstrate that the algorithms fix biases in 
identification results by other methods, and that the run times for the algorithms are reasonable 
and practical. 
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